Applying SFC Model for Chinese Expressive Speech Synthesis
Abstract
This paper presents an approach to modeling the pitch contour in Chinese expressive speech synthesis using the SFC (Superposition of Functional Contours) model. Functional contours corresponding to the expressions are introduced when applying SFC to expressive speech. Both an emotion-dependent method and an emotion-independent method are implemented and compared. Three emotion types (neutral, happiness, and sadness) and stress caused by narrow focus are studied in the experiments. The results show that the RMSE and correlation between the predicted F0 and the neutral one are satisfactory, and listening tests indicate that speech synthesized with the proposed pitch model conveys the intended expressions.
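The two objective measures named in the abstract, RMSE and correlation between a predicted F0 contour and a reference contour, can be sketched as follows. This is only an illustration under assumptions not stated in the abstract: both contours are taken to be time-aligned lists of F0 values in Hz covering the same frames, and the function names `rmse` and `pearson_correlation` are hypothetical, not taken from the paper.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length F0 contours."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("contours must be non-empty and of equal length")
    squared_error = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared_error / len(predicted))


def pearson_correlation(predicted, reference):
    """Pearson correlation coefficient between two equal-length F0 contours."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("contours must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    covariance = sum((p - mean_p) * (r - mean_r)
                     for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return covariance / math.sqrt(var_p * var_r)
```

A lower RMSE means the predicted contour stays closer to the reference in absolute terms, while a correlation near 1 means the two contours rise and fall together even if they differ by a constant offset.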
Related articles
Modeling Prosody Pattern of Chinese Expressive Speech and Its Application in Personalized Speech Conversion
This paper proposes an approach for modeling prosody patterns of acoustic features of Chinese expressive speech. In a Chinese multi-syllabic prosodic word, a syllable is identified as the core syllable based on the observation that the speaker usually puts more emphasis on that syllable. The variations of the acoustic features migrating from neutral to expressive speech are then analyzed for both t...
Modeling the Acoustic Correlates of Dialog Act for Expressive Chinese TTS Synthesis
This paper proposes a novel approach for describing the expressivity of dialog text and modeling its acoustic correlates for expressive text-to-speech (TTS) synthesis. We applied Dialog Acts (DAs) to describe expressivity. In particular, we set up a Wizard-of-Oz (WoZ) data collection framework to collect a tourism-domain corpus and annotated the DAs. A Pitch Target model which is opt...
Databases of Expressive Speech
This paper discusses the construction of speech databases for research into speech information processing and describes a problem illustrated by the case of emotional speech synthesis. It introduces a project for the processing of expressive speech, and describes the data collection techniques and the subsequent analysis of supra-linguistic, and emotional features signalled in the speech. It pr...
Modeling of Fundamental Frequency Contour of Thai Expressive Speech using Fujisaki’s Model and Structural Model
Problem statement: In spontaneous speech communication, prosody is an important factor that must be taken into account, since prosody affects not only the naturalness but also the intelligibility of speech. Focusing on the synthesis of Thai expressive speech, a number of systems have been developed over the years. However, the expressive speech with various speaking styles has not been accomplishe...
Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar
This paper describes our initial work in developing a real-time audio-visual Chinese speech synthesizer with a 3D expressive avatar. The avatar model is parameterized according to the MPEG-4 facial animation standard [1]. This standard offers a compact set of facial animation parameters (FAPs) and feature points (FPs) to enable realization of 20 Chinese visemes and 7 facial expressions (i.e. 27...